Data Extraction
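Extract structured data from pages with CSS or XPath selectors. Pass an `extract` dictionary to `client.scrape()` that maps each output name to a selector prefixed with `css:` or `xpath:`. The results are available in `result.extracted_data` under the same names.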
Basic extraction
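```python
# Assumes `client` is an already-initialized client instance (setup not shown here)
result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "title": "css:h1",
        "first_quote": "css:.text",
    },
)

# Keys in extracted_data match the names used in `extract`
print(result.extracted_data["title"])
print(result.extracted_data["first_quote"])
```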
Multiple values
Set `"multiple": True` in a field's dictionary form to extract all matching elements as a list:
```python
result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "quotes": {"selector": "css:.text", "multiple": True},
        "authors": {"selector": "css:.author", "multiple": True},
    },
)

# Both fields are lists; pair them up by position
for quote, author in zip(result.extracted_data["quotes"], result.extracted_data["authors"]):
    print(f"{quote} — {author}")
```
Extract attributes
Add an `"attribute"` key to extract an element attribute, such as `href`, `src`, or a `data-*` attribute, instead of its text:
```python
result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "links": {"selector": "css:a", "attribute": "href", "multiple": True},
        "images": {"selector": "css:img", "attribute": "src", "multiple": True},
    },
)
```
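Attribute values may come back exactly as written in the markup. This is an assumption worth checking against your own output. If it holds, many `href` values will be relative paths.

The following is a minimal sketch that resolves them against the page URL with the standard library's `urllib.parse.urljoin` and removes duplicates while keeping their first-seen order. `absolutize_links` is a hypothetical helper, and the `links` list is sample data:

```python
from urllib.parse import urljoin


def absolutize_links(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve each href against base_url and drop duplicates, keeping first-seen order."""
    seen: set[str] = set()
    absolute: list[str] = []
    for href in hrefs:
        url = urljoin(base_url, href)  # absolute URLs pass through unchanged
        if url not in seen:
            seen.add(url)
            absolute.append(url)
    return absolute


# Hypothetical sample standing in for result.extracted_data["links"]
links = ["/page/2/", "/tag/love/", "/page/2/", "https://example.com/about"]
for url in absolutize_links("https://quotes.toscrape.com/", links):
    print(url)
```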
XPath selectors
Prefix a selector with `xpath:` instead of `css:` to use XPath:
```python
result = client.scrape(
    "https://quotes.toscrape.com/",
    extract={
        "quotes": "xpath://span[@class='text']",
        "authors": "xpath://small[@class='author']",
    },
)
```
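You can check what an XPath expression matches locally before sending a request. The following is a minimal sketch using the standard library's `xml.etree.ElementTree`.

ElementTree has two limitations here. It supports only a limited XPath subset, and it requires well-formed markup, so it suits small hand-written test snippets rather than real pages. It also needs a relative path (`.//span[...]`) where the example above uses `//span[...]`. The snippet and the `preview_xpath` helper are hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed snippet modeled on the quote markup targeted above
SNIPPET = """
<div>
  <div class="quote">
    <span class="text">First quote</span>
    <small class="author">Author A</small>
  </div>
  <div class="quote">
    <span class="text">Second quote</span>
    <small class="author">Author B</small>
  </div>
</div>
"""


def preview_xpath(markup: str, path: str) -> list[str]:
    """Return the text of every element matching an ElementTree-style XPath, in document order."""
    root = ET.fromstring(markup.strip())
    return [el.text or "" for el in root.findall(path)]


print(preview_xpath(SNIPPET, ".//span[@class='text']"))
print(preview_xpath(SNIPPET, ".//small[@class='author']"))
```

For real-world HTML, a library with fuller XPath support such as lxml is a closer approximation. It may still differ from the selector engine the service uses.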
Extraction + browser
Extraction also works with browser rendering (`browser=True`), for pages whose content is generated by JavaScript:
```python
result = client.scrape(
    "https://spa-app.com/products",
    browser=True,  # render the page in a browser before extracting
    extract={
        "names": {"selector": "css:.product-name", "multiple": True},
        "prices": {"selector": "css:.price", "multiple": True},
    },
)
```
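Because `extract` is a plain dictionary, a typo in a selector prefix or key usually surfaces only after a request has been made.

The following is a minimal sketch of a local pre-flight check. It accepts the two field forms shown on this page: a selector string, or a dict with `selector` and optional `multiple` and `attribute`. It also checks for the `css:` and `xpath:` prefixes. `validate_extract_spec` is a hypothetical helper, not part of the client, and the service may accept forms this check rejects:

```python
ALLOWED_PREFIXES = ("css:", "xpath:")
ALLOWED_KEYS = {"selector", "multiple", "attribute"}


def validate_extract_spec(spec: dict) -> list[str]:
    """Return a list of problems found in an extract spec (empty if none).

    Hypothetical helper: it only knows the field forms documented on this page.
    """
    problems: list[str] = []
    for name, field in spec.items():
        if isinstance(field, str):
            selector = field  # shorthand form: "css:..." or "xpath:..."
        elif isinstance(field, dict):
            unknown = set(field) - ALLOWED_KEYS
            if unknown:
                problems.append(f"{name}: unknown keys {sorted(unknown)}")
            if "multiple" in field and not isinstance(field["multiple"], bool):
                problems.append(f"{name}: 'multiple' should be True or False")
            selector = field.get("selector")
        else:
            problems.append(f"{name}: expected a string or dict")
            continue
        if not isinstance(selector, str) or not selector.startswith(ALLOWED_PREFIXES):
            problems.append(f"{name}: selector must start with 'css:' or 'xpath:'")
    return problems


spec = {
    "names": {"selector": "css:.product-name", "multiple": True},
    "prices": {"selector": ".price", "multipel": True},  # two deliberate mistakes
}
for problem in validate_extract_spec(spec):
    print(problem)
```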
When to use `extract` vs. `EvaluateAction`
| Use case | Tool |
|---|---|
| Data is visible in the HTML/DOM | `extract` (CSS/XPath selectors) |
| Data comes from JavaScript variables | `EvaluateAction` |
| Data comes from internal APIs | `EvaluateAction` |
| Complex DOM logic is needed | `EvaluateAction` |